The paper helicopter represents one of the most elegant and accessible physical models for teaching and demonstrating the principles of Design of Experiments (DOE). This simple yet effective experimental system, consisting of folded paper with adjustable rotor dimensions and optional weight attachments, provides a hands-on approach to understanding how multiple factors simultaneously influence a measurable response. The helicopter’s flight time serves as an ideal response variable, as it can be measured precisely and is influenced by aerodynamic principles that students can intuitively grasp.
In experimental design, researchers must choose between comprehensive data collection and experimental efficiency. A full factorial design explores every possible combination of factor levels, providing complete information about main effects, interactions, and higher-order relationships. For a 2^3 design with three factors at two levels each, this requires 2^3 = 8 unique treatment combinations. When replicated three times for statistical robustness, this yields 24 experimental runs. The full factorial approach guarantees that no information is lost and all possible factor interactions can be independently estimated.
However, full factorial designs can become prohibitively expensive or time-consuming as the number of factors increases. A fractional factorial design offers an efficient alternative by strategically selecting a subset of treatment combinations that still allows estimation of the most important effects. A 2^{3-1} fractional factorial uses only half the runs (4 treatment combinations, or 12 runs with three replicates), significantly reducing experimental effort. This efficiency comes at a cost: aliasing, where certain effects become confounded with others and cannot be independently estimated. The key assumption is that the aliased effects are negligible. In a resolution III design such as the 2^{3-1}, each main effect is aliased with a two-factor interaction, so the two-factor interactions must be assumed small relative to the main effects.
The trade-off between comprehensive information and experimental efficiency represents a fundamental challenge in DOE. While the full factorial design provides unambiguous results, the fractional factorial design requires careful consideration of which effects are likely to be important and which can be safely confounded. In many practical applications, this trade-off is essential for making DOE feasible within resource constraints.
Research Question: How do the main effects and interactions of rotor length, rotor width, and paper clip mass influence the flight time of a paper helicopter, and can a fractional factorial design efficiently identify the same optimal configuration as a full factorial analysis?
Materials and Methods
Experimental Materials
Primary Materials
Paper substrate: Standard A4 office paper (80 g/m², 210 × 297 mm)
Thickness: 0.1 mm (±0.01 mm)
Paper clips: Standard steel paper clips
Dimensions: 50 mm length × 10 mm width
Mass: 0.50 g (±0.02 g) per clip
Cutting tools:
Steel ruler (300 mm, ±0.5 mm accuracy)
Precision craft knife with replaceable blades
Cutting mat (A3 size)
Measurement Equipment
Primary measurement device: Stopwatch
- Resolution: 0.01 seconds
- Accuracy: ±0.02 seconds over 10-second intervals
- Calibration: Verified against laboratory-grade timer before experiments
Figure 1: Hanhart stopwatch used for timing helicopter flight duration
Experimental Setup and Apparatus
Release System
Drop height: 8.20 m (±0.02 m)
- Measured from helicopter center of mass to floor level
Release mechanism:
- Operator holds helicopter by the base (non-rotor section)
- Helicopter oriented vertically with rotors horizontal
- Release performed by simultaneous opening of thumb and forefinger
- Drop initiated with zero initial velocity
Environmental controls:
- Indoor setting
- Air conditioning off during experiments to minimize air currents
- Room temperature: 22°C (±2°C)
Flight Termination Criteria
Landing definition:
First contact between any part of the helicopter and the floor surface.
Contact detection: Visual observation by trained operator.
Timing stops at moment of first contact, not final rest position.
Paper Helicopter Construction Protocol
Step-by-Step Construction
Figure 2: Paper helicopter construction template showing Wing A, Wing B, and fold sections X, Y, Z
Template preparation:
Print template on A4 paper using laser printer
Verify dimensions using steel ruler
Mark fold lines clearly with pencil
Cutting sequence:
Cut along solid lines using craft knife and steel ruler
Maintain consistent pressure for clean edges
Verify rotor dimensions with calipers after cutting
Folding procedure:
Fold rotors along designated lines to create 90° angles
Ensure rotors are mirror images (one left, one right)
Fold body sections as indicated to create base structure
Paper clip attachment:
Attach clips to the bottom-most fold of the helicopter body
Position clips symmetrically to maintain balance
Secure clips by folding paper around them (no adhesive used)
Factor C: Paper Clip Mass
M_C = \begin{cases}
0 \text{ clips} & \text{(low level, coded as -1)} \\
2 \text{ clips} & \text{(high level, coded as +1)}
\end{cases}
Statistical Design and Analysis Plan
Full Factorial Design
The experiment employed a 2^3 full factorial design with 3 replicates, resulting in 24 total experimental runs. The design structure is defined as follows:
The 24 total runs were performed in completely randomized order to minimize the effects of systematic bias, learning effects, and environmental drift during the experimental session.
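The randomized run plan described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure actually used in the study: the function name, the seed value, and the coded (-1, +1) representation of the levels are assumptions made for the example.

```python
import itertools
import random


def randomized_full_factorial(n_factors=3, replicates=3, seed=42):
    """Return a shuffled list of coded runs for a replicated 2^k full factorial.

    The seed is a hypothetical value chosen only to make the example reproducible.
    """
    # All 2^k combinations of coded levels (-1 = low, +1 = high).
    treatments = list(itertools.product((-1, 1), repeat=n_factors))
    # Repeat every treatment combination once per replicate.
    runs = treatments * replicates
    # Shuffle the complete list so replicates are not run back to back.
    rng = random.Random(seed)
    rng.shuffle(runs)
    return runs


run_order = randomized_full_factorial()
for run_number, (a, b, c) in enumerate(run_order, start=1):
    print(f"run {run_number:2d}: A={a:+d}  B={b:+d}  C={c:+d}")
```

Shuffling the full list of 24 runs, rather than shuffling within each replicate, corresponds to a completely randomized design; a blocked design would shuffle each replicate separately.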
Fractional Factorial Design Simulation
To evaluate the efficiency of fractional factorial designs, this study includes a 2^{3-1} half-fraction design simulation using a subset of the full factorial data. The fractional design was constructed using the design generator C = AB, which selects 4 unique treatment combinations from the original 8.
Resulting Alias Structure:
\begin{align}
I &= ABC \\
A &= BC \\
B &= AC \\
C &= AB
\end{align}
This aliasing structure means that main effects are confounded with two-factor interactions, requiring the assumption that interactions are negligible relative to main effects.
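The half-fraction and its alias structure can be checked numerically. The sketch below assumes coded levels of -1 and +1 and uses a hypothetical helper, `contrast`, to build contrast columns; two effects are aliased when their columns are identical within the selected runs.

```python
import itertools

# Coded full factorial: each run is (A, B, C) with -1 = low and +1 = high.
full_design = list(itertools.product((-1, 1), repeat=3))

# Generator C = AB: keep only the runs where the C level equals the product A*B.
half_fraction = [run for run in full_design if run[2] == run[0] * run[1]]


def contrast(design, *factor_indices):
    """Return the contrast column for the product of the given factor columns."""
    column = []
    for run in design:
        value = 1
        for index in factor_indices:
            value *= run[index]
        column.append(value)
    return column


A, B, C = 0, 1, 2
alias_checks = {
    "I = ABC": contrast(half_fraction, A, B, C) == [1] * len(half_fraction),
    "A = BC": contrast(half_fraction, A) == contrast(half_fraction, B, C),
    "B = AC": contrast(half_fraction, B) == contrast(half_fraction, A, C),
    "C = AB": contrast(half_fraction, C) == contrast(half_fraction, A, B),
}

print("Selected runs (A, B, C):", half_fraction)
for relation, holds in alias_checks.items():
    print(f"{relation}: {holds}")
```

Under this generator the selected fraction contains the run (A+, B-, C-), so the configuration later identified as optimal is part of the simulated half-fraction.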
Null Model and Hypothesis Framework
The null model serves as the statistical baseline for hypothesis testing in experimental design. It represents the simplest possible explanation for observed data variation, assuming that experimental factors have no systematic effect on the response variable.
Null Hypothesis Statement
The null hypothesis (H₀) for this paper helicopter experiment states:
H₀: μ₁ = μ₂ = μ₃ = μ₄ = μ₅ = μ₆ = μ₇ = μ₈
Where μᵢ represents the true mean flight time for each of the eight treatment combinations in the 2^3 factorial design. This hypothesis asserts that rotor length, rotor width, and paper clip mass have no effect on helicopter flight time, and any observed differences between treatment means result solely from random experimental error.
Null Model Specification
The null model assumes that all variation in flight time can be attributed to random error:
Y_{ij} = \mu + \varepsilon_{ij}
Where:
- Y_{ij} = flight time for the j-th observation in the i-th treatment combination
- \mu = grand mean flight time across all experimental conditions
- \varepsilon_{ij} = random error term, assumed \varepsilon_{ij} \sim N(0, \sigma^2)
Under this model, the best predictor of any individual flight time is simply the overall experimental average, regardless of factor settings.
Statistical Testing Framework
The Analysis of Variance (ANOVA) tests the null hypothesis by comparing the null model against the full factorial model. The test statistic evaluates whether observed treatment differences exceed what would be expected under random variation alone:
F = \frac{\text{Mean Square for Treatments}}{\text{Mean Square Error}}
Rejection of H₀ (F-statistic significantly greater than 1) provides evidence that at least one factor significantly affects helicopter flight time, justifying progression from the null model to the full factorial analysis.
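For comparison with the null model, the full factorial model it is tested against can be written out explicitly. The form below is the standard three-factor effects model; the symbols \alpha, \beta, \gamma and the four-index notation are assumptions introduced here for illustration and do not appear in the original analysis. The degrees of freedom follow from 8 treatment combinations and 24 runs.

```latex
% Full factorial effects model: i, j, k index the levels of A, B, C; l indexes replicates
Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k
         + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}
         + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl},
\qquad \varepsilon_{ijkl} \sim N(0, \sigma^2)

% Overall test of H_0 (all treatment means equal)
F = \frac{SS_{\text{Treatments}} / (8 - 1)}{SS_{\text{Error}} / (24 - 8)}
  \sim F_{7,\,16} \quad \text{under } H_0
```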
Statistical Analysis Methods
The experimental data was analyzed using Analysis of Variance (ANOVA) with a significance level of \alpha = 0.05 to identify statistically significant effects. Following the ANOVA, a linear regression model was developed based on the significant effects to predict helicopter flight time as a function of the experimental factors.
Model adequacy was assessed through comprehensive residual analysis, including:
Normal probability plots to verify the assumption of normally distributed residuals
Residuals vs. fitted values plots to check for homoscedasticity and model adequacy
Outlier detection using standardized residuals and leverage analysis
Measurement System Analysis (MSA Type 1)
A Measurement System Analysis Type 1 study was performed to validate the adequacy of the manual stopwatch timing method. This study quantified the measurement system’s precision (repeatability) and accuracy (bias) by isolating the stopwatch timing error from helicopter flight variability, ensuring that observed experimental effects represent true helicopter performance differences rather than measurement artifacts.
Experimental Protocol
Reference Value Establishment: A reference time of 4.150 seconds was selected to match the typical flight time of the optimal helicopter configuration (A+ B- C-) identified in preliminary testing. This reference value represents the “true” time that the measurement system aims to capture accurately.
Measurement Procedure: The primary experimenter (single operator throughout all trials) performed 50 consecutive timing trials using the following protocol:
A digital timer (smartphone stopwatch application with millisecond precision) was used as the reference standard
Both the digital timer and the Hanhart stopwatch were started simultaneously
When the digital timer reached exactly 4.150 seconds (indicated by visual display or audible signal), the operator stopped the Hanhart stopwatch
The stopwatch reading was recorded immediately
The difference between the stopwatch reading and the reference value (4.150 seconds) was calculated as the measurement error for that trial
The process was repeated 50 times in a single session to maintain consistency
Tolerance Specification
The tolerance represents the acceptable range of variation for the measurement system relative to the process being measured. Rather than arbitrarily selecting a tolerance value, this study adopted a target-driven approach based on Six Sigma methodology, which specifies that measurement systems should achieve a Capability Index (Cg) of at least 2.0 for ideal performance.
The tolerance was calculated backwards from the target Cg value using the relationship:
T = \frac{C_g \times 6 \times \sigma}{0.2}
where σ is the standard deviation of the 50 stopwatch measurements. This approach ensures that the tolerance specification is appropriate for the observed measurement system variability while targeting industry-standard capability levels.
Capability Indices
Two standard capability indices were calculated to assess measurement system adequacy:
Potential Gage Capability (Cg): This index quantifies measurement precision (repeatability) by comparing the measurement system’s variation to a percentage of the tolerance:
C_g = \frac{0.2 \times T}{6 \times \sigma}
where T is the total tolerance and σ is the standard deviation of the 50 measurements. A value of Cg ≥ 1.33 indicates adequate precision, with Cg = 2.0 representing Six Sigma ideal performance.
Gage Capability with Systematic Error (Cgk): This index accounts for both precision and accuracy (bias):
C_{gk} = \frac{0.2 \times T - |\text{Bias}|}{3 \times \sigma}
where Bias = x̄ - x_true, with x̄ representing the mean of the 50 measurements and x_true = 4.150 seconds. A value of Cgk ≥ 1.33 indicates adequate combined precision and accuracy.
Acceptance Criteria: Both Cg and Cgk must be ≥ 1.33 for the measurement system to be considered adequate. Values of Cg ≥ 2.0 represent excellent (Six Sigma level) capability.
The statistical constant K = 0.2 (representing 20% of the tolerance) is used in both calculations as a standard fraction that balances measurement system capability against practical tolerance requirements.
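The tolerance and capability calculations can be sketched as follows. The function names are hypothetical, and the inputs are the rounded summary values reported for the 50 stopwatch trials (σ = 0.134 s, mean 4.180 s, reference 4.150 s). The Cgk expression follows the formula listed with the capability results, ((0.2 × T) − |Bias|) / (3 × σ). Because T is back-calculated from σ, Cg equals the target value by construction.

```python
def tolerance_for_target_cg(sigma, target_cg=2.0, k=0.2):
    """Back-calculate the total tolerance T that yields the target Cg."""
    return target_cg * 6.0 * sigma / k


def gage_capability(sigma, bias, tolerance, k=0.2):
    """Return (Cg, Cgk) using the formulas given in this report."""
    cg = (k * tolerance) / (6.0 * sigma)
    cgk = (k * tolerance - abs(bias)) / (3.0 * sigma)
    return cg, cgk


# Rounded summary statistics from the MSA Type 1 study.
sigma = 0.134          # standard deviation of the stopwatch readings (s)
bias = 4.180 - 4.150   # mean reading minus reference value (s)

tolerance = tolerance_for_target_cg(sigma)
cg, cgk = gage_capability(sigma, bias, tolerance)

print(f"Total tolerance T = {tolerance:.3f} s (+/- {tolerance / 2:.3f} s)")
print(f"Cg  = {cg:.2f}")
print(f"Cgk = {cgk:.2f}")
```

Some references define Cgk with half of the tolerance band, (0.1 × T − |Bias|) / (3 × σ), in which case Cgk cannot exceed Cg; the sketch keeps the formula used in this report so that its output can be compared with the reported values.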
Results
This section presents the statistical results from the experimental data. The findings are reported objectively without interpretation, following the structure of the analysis plan. First, the results from the full 2^3 factorial design are presented, followed by the analysis of the simulated 2^{3-1} fractional factorial design.
Measurement System Analysis (MSA Type 1)
Prior to analyzing the factorial experiment data, the stopwatch timing method was validated through MSA Type 1 to ensure adequate precision and accuracy.
Table 1: Summary statistics from MSA Type 1 study (n=50 stopwatch trials)
MSA Type 1 Summary Statistics

| Statistic | Value |
|---|---|
| Number of Trials | 50 |
| Reference Value (Phone Timer) | 4.150 s |
| Mean Stopwatch Reading | 4.180 s |
| Standard Deviation | 0.1340 s |
| Minimum Reading | 3.8 s |
| Maximum Reading | 4.5 s |
| Range | 0.7 s |
Measurement Accuracy and Precision
Table 2: Bias analysis showing measurement accuracy relative to digital reference
Accuracy Assessment

| Metric | Value |
|---|---|
| Systematic Bias | 0.0300 seconds |
| Bias as % of Reference | 0.72% |
| Absolute Mean Error | 0.1180 seconds |
| Interpretation | Negligible bias - measurements well-centered |
Table 3: Gage capability indices calculated from stopwatch timing trials
MSA Type 1 Capability Indices

| Index | Description | Formula | Value | Criterion | Status |
|---|---|---|---|---|---|
| Cg | Precision (Repeatability) | (0.2 * T) / (6 * sigma) | 2.00 | >= 1.33 | Adequate |
| Cgk | Precision & Accuracy Combined | ((0.2 * T) - \|Bias\|) / (3 * sigma) | 3.93 | >= 1.33 | Adequate |
Tolerance Specification: To achieve the target capability of Cg = 2.0 (Six Sigma ideal), the calculated tolerance is ±4.020 seconds (total tolerance = 8.041 seconds). This tolerance was derived from the observed measurement variability and represents the acceptable range for stopwatch timing error relative to the specified capability target.
MSA Type 1 Visualizations
Figure 3: Run chart showing all 50 stopwatch readings against the phone timer reference (4.150s). The horizontal lines indicate the reference value (dashed black) and mean stopwatch reading (solid red).
Figure 4: Distribution of measurement errors (stopwatch reading minus reference value). The histogram shows the frequency of positive (late) and negative (early) timing errors.
Figure 5: Capability indices compared to acceptance criterion (1.33) and Six Sigma ideal target (2.0). Both indices exceed minimum requirements.
Full Factorial Design Analysis
The analysis was conducted on the complete dataset of 24 runs from the full factorial experiment.
Table 4: Descriptive statistics for the 24 flight times from the full factorial experiment
Descriptive Statistics of Flight Time (seconds)

| N | Mean | Std Dev | Minimum | Median | Maximum |
|---|---|---|---|---|---|
| 24 | 3.205 | 0.521 | 2.32 | 3.22 | 4.18 |
Table 5: Mean flight times for all eight treatment combinations
Treatment Combination Means (Ranked by Performance)

| Rotor Length (A) | Rotor Width (B) | Paper Clips (C) | Mean Time (s) | Std Dev (s) | n |
|---|---|---|---|---|---|
| 8.5cm (High) | 3.5cm (Low) | 0 clips (Low) | 4.147 | 0.031 | 3 |
| 8.5cm (High) | 5.0cm (High) | 0 clips (Low) | 3.510 | 0.131 | 3 |
| 7.5cm (Low) | 3.5cm (Low) | 0 clips (Low) | 3.407 | 0.100 | 3 |
| 8.5cm (High) | 3.5cm (Low) | 2 clips (High) | 3.367 | 0.104 | 3 |
| 8.5cm (High) | 5.0cm (High) | 2 clips (High) | 3.110 | 0.069 | 3 |
| 7.5cm (Low) | 3.5cm (Low) | 2 clips (High) | 3.007 | 0.040 | 3 |
| 7.5cm (Low) | 5.0cm (High) | 2 clips (High) | 2.583 | 0.350 | 3 |
| 7.5cm (Low) | 5.0cm (High) | 0 clips (Low) | 2.513 | 0.110 | 3 |
To identify the most influential factors and interactions, a full linear model was fitted to the data. The standardized effects of all model terms are visualized in a Pareto chart.
Figure 6: Pareto chart of standardized effects for flight time. The vertical line indicates the significance threshold at α = 0.05. Effects extending beyond this line are statistically significant.
The statistical significance of each term was formally assessed using Analysis of Variance (ANOVA).
Table 6: Analysis of Variance (ANOVA) for the full factorial model. Terms with p-value < 0.05 are statistically significant.
ANOVA Results for Full Factorial Model

| Term | Df | Sum Sq | Mean Sq | F value | P value |
|---|---|---|---|---|---|
| Rotor Length | 1 | 2.581 | 2.581 | 114.889 | <0.001 |
| Rotor Width | 1 | 1.831 | 1.831 | 81.538 | <0.001 |
| Paper Clips | 1 | 0.855 | 0.855 | 38.065 | <0.001 |
| Rotor Length × Rotor Width | 1 | 0.067 | 0.067 | 2.992 | 0.1029 |
| Rotor Length × Paper Clips | 1 | 0.271 | 0.271 | 12.062 | 0.0031 |
| Rotor Width × Paper Clips | 1 | 0.271 | 0.271 | 12.062 | 0.0031 |
| Rotor Length × Rotor Width × Paper Clips | 1 | 0.003 | 0.003 | 0.135 | 0.7179 |
| Residuals | 16 | 0.359 | 0.022 | NA | NA |
Based on the significant effects identified in the ANOVA, a reduced predictive model was developed including all significant terms.
Model Performance: The reduced model explains 93.11% of the variance in flight time (Adjusted R² = 91.2%).
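The reported R² and adjusted R² can be approximately reproduced from the rounded sums of squares in Table 6. The sketch below assumes that dropping the two non-significant terms (A×B and A×B×C) simply pools their sums of squares into the residual, which holds for a balanced, orthogonal two-level design; the variable names are illustrative.

```python
# Rounded sums of squares copied from the full factorial ANOVA (Table 6).
sum_sq = {
    "A": 2.581, "B": 1.831, "C": 0.855,
    "AB": 0.067, "AC": 0.271, "BC": 0.271, "ABC": 0.003,
}
ss_residual_full = 0.359
n_runs = 24

# Terms removed from the reduced model (not significant at alpha = 0.05).
dropped_terms = ("AB", "ABC")

ss_total = sum(sum_sq.values()) + ss_residual_full
# Assumption: in a balanced orthogonal design the dropped terms pool into the residual.
ss_residual_reduced = ss_residual_full + sum(sum_sq[t] for t in dropped_terms)
n_model_terms = len(sum_sq) - len(dropped_terms)  # 5 terms, intercept excluded

r_squared = 1.0 - ss_residual_reduced / ss_total
adjusted_r_squared = 1.0 - (
    (ss_residual_reduced / (n_runs - n_model_terms - 1)) / (ss_total / (n_runs - 1))
)

print(f"R^2          = {r_squared:.4f}")
print(f"Adjusted R^2 = {adjusted_r_squared:.4f}")
```

Small differences from the reported values are expected because the inputs are rounded to three decimals.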
Table 7: Regression coefficients for the correctly specified reduced model
Regression Coefficients for Correctly Specified Model

| Term | Estimate | Std Error | t value | P value |
|---|---|---|---|---|
| (Intercept) | 3.342 | 0.077 | 43.270 | <0.001 |
| Rotor Length | 0.868 | 0.089 | 9.735 | <0.001 |
| Rotor Width | -0.765 | 0.089 | -8.576 | <0.001 |
| Paper Clips | -0.378 | 0.109 | -3.455 | 0.0028 |
| Rotor Length × Paper Clips | -0.425 | 0.126 | -3.369 | 0.0034 |
| Rotor Width × Paper Clips | 0.425 | 0.126 | 3.369 | 0.0034 |
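The coefficients in Table 7 can be turned into a small prediction function. The sketch assumes the model was fitted with treatment (0/1 dummy) coding in which the low level of each factor is the baseline; this is inferred from the estimates (the intercept is close to the mean of the all-low configuration rather than the grand mean) and is not stated in the analysis. The function and dictionary names are hypothetical.

```python
import itertools

# Coefficient estimates copied from Table 7.
COEFFICIENTS = {
    "intercept": 3.342,
    "length": 0.868,
    "width": -0.765,
    "clips": -0.378,
    "length:clips": -0.425,
    "width:clips": 0.425,
}


def predict_flight_time(length_high, width_high, clips_high):
    """Predict flight time (s); each argument is 1 for the high level, 0 for low.

    Assumes 0/1 dummy coding with the low level as baseline.
    """
    a, b, c = int(length_high), int(width_high), int(clips_high)
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["length"] * a
        + COEFFICIENTS["width"] * b
        + COEFFICIENTS["clips"] * c
        + COEFFICIENTS["length:clips"] * a * c
        + COEFFICIENTS["width:clips"] * b * c
    )


# Evaluate all eight corners of the design space and pick the longest predicted flight.
predictions = {
    levels: predict_flight_time(*levels)
    for levels in itertools.product((0, 1), repeat=3)
}
best_levels = max(predictions, key=predictions.get)

for levels, time_s in sorted(predictions.items(), key=lambda item: -item[1]):
    print(f"A={levels[0]} B={levels[1]} C={levels[2]} -> {time_s:.3f} s")
print("Best predicted configuration (A, B, C):", best_levels)
```

Under this coding assumption the predictions differ somewhat from the observed cell means in Table 5 because the reduced model omits the A×B and A×B×C terms.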
Table 8: Main effect sizes and practical significance
Main Effect Sizes and Practical Impact

| Factor | Effect Size (s) | Percent Change (%) | Direction |
|---|---|---|---|
| Rotor Length Effect | 0.656 | 20.5 | Increase |
| Rotor Width Effect | -0.552 | 17.2 | Decrease |
| Paper Clip Effect | -0.377 | 11.8 | Decrease |
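The main effects can be recomputed from the treatment means in Table 5 as the difference between the average response at the high and low level of each factor. The sketch below uses the rounded cell means, so results agree with the reported effects only to within rounding; the helper name `main_effect` is illustrative.

```python
# Mean flight times (s) from Table 5, keyed by coded levels (A, B, C).
cell_means = {
    (+1, -1, -1): 4.147,
    (+1, +1, -1): 3.510,
    (-1, -1, -1): 3.407,
    (+1, -1, +1): 3.367,
    (+1, +1, +1): 3.110,
    (-1, -1, +1): 3.007,
    (-1, +1, +1): 2.583,
    (-1, +1, -1): 2.513,
}


def main_effect(means, factor_index):
    """Average response at the high level minus average response at the low level."""
    high = [m for levels, m in means.items() if levels[factor_index] == +1]
    low = [m for levels, m in means.items() if levels[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


grand_mean = sum(cell_means.values()) / len(cell_means)
effects = {name: main_effect(cell_means, i) for i, name in enumerate("ABC")}

for name, effect in effects.items():
    percent = 100.0 * abs(effect) / grand_mean
    print(f"Factor {name}: effect = {effect:+.3f} s ({percent:.1f}% of grand mean)")
```

Because the design is balanced with three replicates per cell, averaging the cell means gives the same main effects as averaging the individual runs.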
Model adequacy was assessed by analyzing the model’s residuals.
Figure 7: Diagnostic plots for the correctly specified reduced model: (A) Residuals vs Fitted values, (B) Normal Q-Q plot
The effects of all significant factors on flight time are visualized below.
Figure 8: Main effect of Rotor Length (A) on flight time. The plot shows the distribution of flight times at the low (7.5cm) and high (8.5cm) levels.
Figure 9: Main effect of Rotor Width (B) on flight time. The plot shows the distribution of flight times at the low (3.5cm) and high (5.0cm) levels.
Figure 10: Main effect of Paper Clips (C) on flight time. The plot shows the distribution of flight times with no paper clips (Low) and with two paper clips (High).
The significant two-factor interactions are shown below.
Figure 11: Significant two-factor interaction plots: (A) Rotor Length × Paper Clips, (B) Rotor Width × Paper Clips
The complete experimental design space is visualized in the cube plot showing all treatment combinations.
Figure 12: Enhanced cube plot showing mean flight times at each corner of the design space, with optimal configuration highlighted
Fractional Factorial Design Analysis
The analysis was repeated using only the 12 runs corresponding to a 2^{3-1} half-fraction design, simulating a more resource-efficient experiment.
Figure 13: 3D cube plot showing the 2^(3-1) fractional factorial design space. Red vertices represent included treatment combinations, orange vertices represent excluded combinations. Lines connect adjacent factor levels to form the cube structure.
An ANOVA was performed on the fractional design data, with results reflecting the design’s alias structure.
Table 9: ANOVA for fractional factorial model showing aliased effects
ANOVA Results for Fractional Factorial Model

| Aliased Effect | Df | Sum Sq | Mean Sq | F value | P value |
|---|---|---|---|---|---|
| A (+ BC interaction) | 1 | 1.635 | 1.635 | 51.361 | <0.001 |
| B (+ AC interaction) | 1 | 2.832 | 2.832 | 88.953 | <0.001 |
| C (+ AB interaction) | 1 | 0.924 | 0.924 | 29.021 | <0.001 |
| Residuals | 8 | 0.255 | 0.032 | NA | NA |
A comparison of the optimal configurations identified by both designs demonstrates the effectiveness of the fractional approach.
Table 10: Comparison of optimal configurations and performance between full and fractional factorial designs
Optimal Configuration Comparison

| Design | Rotor Length | Rotor Width | Paper Clips | Mean Flight Time (s) |
|---|---|---|---|---|
| Full Factorial (24 runs) | 8.5cm (High) | 3.5cm (Low) | 0 clips (Low) | 4.147 |
| Fractional Factorial (12 runs) | 8.5cm (High) | 3.5cm (Low) | 0 clips (Low) | 4.270 |
A final comparison shows how well the fractional design conclusions align with the full factorial analysis.
Table 11: Statistical significance comparison between full and fractional factorial analyses
Design Agreement on Statistical Significance

| Effect | Full Factorial (p-value) | Fractional Factorial (p-value) | Agreement |
|---|---|---|---|
| Factor A (Rotor Length) | <0.001 | <0.001 | ✓ Both Significant |
| Factor B (Rotor Width) | <0.001 | <0.001 | ✓ Both Significant |
| Factor C (Paper Clips) | <0.001 | <0.001 | ✓ Both Significant |
Statistical Model Comparison
A comparison of F-statistics across different modeling approaches demonstrates the relative strength of evidence against the null hypothesis.
Table 12: F-statistic comparison across modeling approaches
F-Statistic Comparison Against Null Model

| Model | F statistic | df1, df2 | P value | R² |
|---|---|---|---|---|
| Null Model | - | - | - | 0.0% |
| Fractional Factorial | 56.44 | 3, 8 | <0.001 | 95.5% |
| Full Factorial | 37.39 | 7, 16 | <0.001 | 94.2% |
Discussion
This section interprets the statistical results presented in the Results section, connecting them to the underlying physical principles of the experiment and the broader context of experimental design. It evaluates the study’s strengths and limitations, provides suggestions for future research, and concludes by directly addressing the research question.
Interpretation of Findings
The primary objective of this study was to determine how rotor length, rotor width, and paper clip mass influence the flight time of a paper helicopter. The full factorial analysis revealed that all three factors significantly influence flight performance. The ANOVA results identified Rotor Length (Factor A), Rotor Width (Factor B), and Paper Clips (Factor C) as statistically significant main effects, along with two significant two-factor interactions: A×C and B×C.
Main Effects Analysis
Factor A (Rotor Length) showed the strongest positive effect (+0.656s, p < 0.001), confirming that increasing rotor length from 7.5 cm to 8.5 cm substantially improves flight time. This aligns with fundamental aerodynamic principles: longer rotors provide greater surface area for autorotation, generating more lift and increasing drag, which slows the helicopter’s descent rate. The effect represents a 20.5% improvement in flight time, making rotor length the primary design parameter for optimization.
Factor C (Paper Clips) demonstrated a significant negative effect (-0.377s, p < 0.001), as expected from basic physics principles. Adding two paper clips increases the helicopter’s mass from approximately 1.0g to 2.0g, doubling the gravitational force. Because the rotor geometry, and therefore the drag available at a given descent speed, is unchanged, the helicopter must fall faster before drag balances the larger weight. Its terminal descent speed therefore increases and flight time shortens. The 11.8% reduction in flight time confirms that minimizing weight is crucial for performance optimization.
Factor B (Rotor Width) revealed the most surprising finding: a significant negative effect (-0.552s, p < 0.001). Increasing rotor width from 3.5 cm to 5.0 cm decreases flight time by 17.2%, contradicting the intuitive expectation that larger surface area should improve aerodynamic performance. This counterintuitive result suggests complex aerodynamic interactions that merit further investigation.
Statistical Evidence and Null Hypothesis Rejection
The F-statistic comparison provides definitive statistical evidence for rejecting the null hypothesis. The null hypothesis (H₀: μ₁ = μ₂ = … = μ₈) proposed that all treatment combinations produce identical mean flight times, with observed differences attributable solely to random experimental error.
Evidence Against the Null Hypothesis:
Full Factorial F-statistic: 37.39 (df = 7, 16) with p < 0.001
Fractional Factorial F-statistic: 56.44 (df = 3, 8) with p < 0.001
Critical F-values at α = 0.05: approximately 2.66 for (7, 16) and 4.07 for (3, 8) degrees of freedom
Both F-statistics exceed the critical threshold by substantial margins, providing overwhelming statistical evidence that experimental factors genuinely affect helicopter flight time. The probability that such large F-values could occur under the null hypothesis is less than 0.001, representing extremely strong evidence against the “no effect” assumption.
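The overall F-statistics and R² values can be approximately reconstructed from the rounded sums of squares reported in the two ANOVA tables. The sketch below is illustrative: the helper `f_statistic` is a hypothetical name, and small deviations from the reported values are expected because the inputs are rounded.

```python
def f_statistic(ss_model, df_model, ss_residual, df_residual):
    """Overall F: mean square for the model divided by mean square error."""
    return (ss_model / df_model) / (ss_residual / df_residual)


# Full factorial: seven model terms (Table 6), residual SS = 0.359 on 16 df.
full_terms = [2.581, 1.831, 0.855, 0.067, 0.271, 0.271, 0.003]
ss_full, ss_res_full = sum(full_terms), 0.359
f_full = f_statistic(ss_full, len(full_terms), ss_res_full, 16)
r2_full = ss_full / (ss_full + ss_res_full)

# Fractional factorial: three aliased terms (Table 9), residual SS = 0.255 on 8 df.
frac_terms = [1.635, 2.832, 0.924]
ss_frac, ss_res_frac = sum(frac_terms), 0.255
f_frac = f_statistic(ss_frac, len(frac_terms), ss_res_frac, 8)
r2_frac = ss_frac / (ss_frac + ss_res_frac)

print(f"Full factorial:       F(7, 16) = {f_full:.2f}, R^2 = {r2_full:.3f}")
print(f"Fractional factorial: F(3, 8)  = {f_frac:.2f}, R^2 = {r2_frac:.3f}")
```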
Practical Significance: The improvement in explanatory power from 0% (null model) to 94.2% (full factorial) and 95.5% (fractional factorial) demonstrates that factor effects are not only statistically significant but also practically substantial. This supports the conclusion that the factor effects are real rather than statistical artifacts and justifies rejecting the null hypothesis in favor of the factorial models.
Interaction Effects Analysis
The study identified two significant two-factor interactions that demonstrate the complexity of the system:
A×C Interaction (Rotor Length × Paper Clips): This interaction (-0.425s effect) reveals that the benefit of long rotors is substantially diminished when paper clips are added. Specifically:
- Long rotors with no clips: 3.828s (optimal region)
- Long rotors with clips: 3.238s (benefit reduced)
- Short rotors with no clips: 2.96s
- Short rotors with clips: 2.795s (minimal difference)
This interaction suggests that the aerodynamic advantage of longer rotors is compromised by the added mass and altered center of gravity from paper clips. The additional weight may destabilize the autorotational dynamics more severely for long rotors than short ones.
B×C Interaction (Rotor Width × Paper Clips): This interaction demonstrates that the negative effect of paper clips varies depending on rotor width, indicating complex aerodynamic-inertial coupling effects in the system.
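Both interactions can be quantified from the Table 5 cell means as a difference of differences: the effect of one factor at the high level of the other, minus its effect at the low level. The sketch below uses hypothetical helper names and the rounded cell means, so the results match the reported interaction coefficients only to within rounding.

```python
# Mean flight times (s) from Table 5, keyed by coded levels (A, B, C).
cell_means = {
    (+1, -1, -1): 4.147,
    (+1, +1, -1): 3.510,
    (-1, -1, -1): 3.407,
    (+1, -1, +1): 3.367,
    (+1, +1, +1): 3.110,
    (-1, -1, +1): 3.007,
    (-1, +1, +1): 2.583,
    (-1, +1, -1): 2.513,
}


def conditional_mean(means, i, level_i, j, level_j):
    """Average response over all cells with factor i at level_i and factor j at level_j."""
    values = [m for lv, m in means.items() if lv[i] == level_i and lv[j] == level_j]
    return sum(values) / len(values)


def interaction_contrast(means, i, j):
    """Effect of factor i at the high level of factor j minus its effect at the low level."""
    effect_at_high = conditional_mean(means, i, +1, j, +1) - conditional_mean(means, i, -1, j, +1)
    effect_at_low = conditional_mean(means, i, +1, j, -1) - conditional_mean(means, i, -1, j, -1)
    return effect_at_high - effect_at_low


A, B, C = 0, 1, 2
ac_interaction = interaction_contrast(cell_means, A, C)
bc_interaction = interaction_contrast(cell_means, B, C)

print(f"Long rotors, no clips:  {conditional_mean(cell_means, A, +1, C, -1):.4f} s")
print(f"Long rotors, clips:     {conditional_mean(cell_means, A, +1, C, +1):.4f} s")
print(f"Short rotors, no clips: {conditional_mean(cell_means, A, -1, C, -1):.4f} s")
print(f"Short rotors, clips:    {conditional_mean(cell_means, A, -1, C, +1):.4f} s")
print(f"A x C contrast: {ac_interaction:+.3f} s")
print(f"B x C contrast: {bc_interaction:+.3f} s")
```

The positive sign of the B×C contrast indicates that the penalty for wide rotors is smaller when clips are attached than when they are not.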
Optimal Configuration
The interactions demonstrate that helicopter optimization requires a systems approach rather than independent factor optimization. The optimal configuration identified is:
Long rotor length (8.5 cm): Maximizes aerodynamic lift generation
Narrow rotor width (3.5 cm): Optimizes lift-to-drag ratio and stability
No paper clips (0 clips): Minimizes gravitational force
This combination achieved a mean flight time of 4.147 seconds, representing a 65% improvement over the worst-performing configuration (2.513 seconds). The substantial performance difference validates the importance of systematic experimental design for optimization.
Experimental Design Effectiveness
Full Factorial Design Performance
The full factorial design successfully identified all significant effects and interactions, explaining 93.1% of the variance in flight time through the correctly specified reduced model (94.2% for the full model). The high R² value demonstrates that the factorial approach captured the essential physics governing the system, with minimal unexplained variance remaining.
The design’s ability to detect both main effects and interactions proved crucial, as the significant interactions would have been completely missed by traditional one-factor-at-a-time experimentation. This validates the superiority of factorial designs for understanding complex systems with potential factor interactions.
Fractional Factorial Design Assessment
The 2^{3-1} fractional factorial design demonstrated excellent screening effectiveness, successfully identifying all three significant main effects using only 50% of the experimental effort. Key performance metrics include:
Screening Success: All main effects identified as significant (p < 0.001) in both designs.
Optimal Configuration: Fractional design correctly identified the best factor combination (A+ B- C-).
Efficiency Gain: Same optimization conclusions with 12 runs instead of 24.
Resource Savings: 50% reduction in experimental time, materials, and cost
Aliasing Impact Assessment
The fractional design’s alias structure (A = BC, B = AC, C = AB) created confounding between main effects and two-factor interactions. However, this limitation did not compromise the practical value of the results because:
Effect Sparsity Principle Validated: The assumption that main effects dominate over interactions proved largely correct
Screening Objective Met: The primary goal of identifying important factors was achieved
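Interaction Contributions Absorbed: Although individual interactions could not be estimated, each aliased term captures the combined contribution of a main effect and its confounded interaction, and these combined terms pointed to the same optimal configuration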
The success of the fractional design supports its use for initial screening in resource-constrained situations, with the caveat that follow-up experiments may be needed to resolve specific interactions.
Experimental Design Quality Assessment
Methodological Strengths
Rigorous Randomization: The completely randomized run order effectively controlled for time-related confounding variables such as operator learning effects, environmental drift, and systematic measurement bias.
Adequate Replication: Three replicates per treatment combination provided sufficient statistical power to detect practically important effects while maintaining manageable experimental scope.
Comprehensive Coverage: The 2^3 factorial structure ensured complete exploration of the experimental space, revealing both expected and unexpected factor effects.
Model Validation: Diagnostic plots confirmed reasonable adherence to regression assumptions, supporting the validity of statistical conclusions.
Measurement System Analysis Validation
The MSA Type 1 study provides essential validation of the stopwatch timing method by isolating pure measurement system error from helicopter flight variability. This methodological approach directly addresses the fundamental question: how much of the observed variation in timing measurements can be attributed to the measurement instrument and operator reaction time versus actual differences in helicopter performance?
MSA Type 1 Results Interpretation
The capability indices of Cg = 2 and Cgk = 3.93 both exceed the acceptance criterion of 1.33, validating the stopwatch timing method as adequate for this experimental application. These results warrant detailed interpretation:
Precision Assessment (Cg = 2):
The Cg value of 2 meets the Six Sigma ideal target of 2.0, confirming that the measurement standard deviation is appropriately small relative to the specified tolerance. The observed σ = 0.134 seconds reflects the combined influence of human reaction time variability (approximately 0.15-0.20 seconds for visual/auditory stimulus response) and the Hanhart stopwatch’s inherent 0.1-second resolution limitation.
Accuracy Assessment (Cgk = 3.93):
The exceptionally high Cgk value of 3.93 indicates that systematic bias is negligible in this measurement system. The observed bias of 0.03 seconds represents only 0.72% of the reference value, demonstrating excellent measurement centering. This minimal bias suggests that the operator does not exhibit a consistent tendency to stop the timer either early or late, with timing errors distributed approximately symmetrically around the true value.
The difference between Cgk (3.93) and Cg (2.00) follows directly from the formulas in Table 3, which give Cgk = 2 × Cg - |Bias| / (3 × σ). With |Bias| / (3 × σ) = 0.03 / 0.402 ≈ 0.07, the result is 4.00 - 0.07 = 3.93. A Cgk close to 2 × Cg therefore indicates that the bias is small relative to the measurement spread. In contrast, a Cgk far below 2 × Cg would suggest problematic systematic bias requiring calibration adjustment.
Interaction Detection Validation:
The MSA validation also explains why the factorial design successfully detected subtle two-factor interactions (A×C and B×C). While these interactions represent smaller effects than the main factors, the measurement system’s validated precision (σ = 0.134 seconds) is sufficiently small to distinguish interaction effects from random measurement noise. Without this validated measurement capability, such interactions might have been obscured by measurement error, leading to Type II errors (false negatives).
Optimal Configuration Reliability:
The identified optimal configuration (A+ B- C-, achieving 4.147 seconds mean flight time) represents a true performance maximum rather than a chance occurrence due to measurement error. With measurement error accounting for only 3.2% of the optimal configuration’s flight time, the conclusion that long rotors, narrow width, and no clips maximize flight time is robust and reliable.
Integration with Fractional Factorial Analysis
The MSA validation also supports the fractional factorial design analysis presented earlier. With measurement error quantified at 0.134 seconds, the fractional factorial design’s ability to identify the optimal configuration using only 12 runs (versus 24 for the full factorial) is validated. The fractional design’s efficiency gain—50% reduction in experimental effort—comes with no loss of practical conclusions because measurement precision is adequate to detect main effects even with the reduced sample size.
Conclusion
This study successfully answered the research question through systematic experimental design and analysis. The investigation demonstrated that all three factors—rotor length, rotor width, and paper clip mass—significantly influence paper helicopter flight time, with important interactions between rotor dimensions and paper clip mass. The optimal configuration for maximizing flight time consists of long rotors (8.5 cm), narrow width (3.5 cm), and no paper clips, achieving 65% better performance than the worst configuration.
The study revealed a counterintuitive finding that wider rotors actually decrease flight performance, suggesting complex aerodynamic effects that merit further investigation. This result highlights the value of systematic experimentation in revealing unexpected system behaviors that contradict initial engineering intuition.
Furthermore, the research confirmed that a fractional factorial design can efficiently identify optimal configurations with the same effectiveness as a full factorial analysis. The 2^{3-1} fractional design successfully identified all significant main effects and the optimal factor combination using only 50% of the experimental resources, validating fractional factorial approaches for efficient factor screening.
From a methodological perspective, this work demonstrates the power of factorial experimental design in:
Revealing complex factor interactions missed by sequential approaches
Achieving substantial performance improvements through systematic optimization
Providing efficient screening methods for resource-constrained environments
Validating effect sparsity principles in engineering applications
The paper helicopter experiment serves as an excellent pedagogical tool for teaching DOE principles, combining accessible construction with rigorous statistical analysis while demonstrating real engineering optimization challenges and the unexpected complexity that can emerge even in simple physical systems.